Abstract. We prove a separation theorem for Independent Subspace Analysis (ISA), a generalization of Independent
Component Analysis (ICA). According to the theorem, under certain conditions ISA estimation can be executed in two
steps. In the first step, 1-dimensional ICA estimation is executed. In the second step, the optimal permutation
of the ICA elements is searched for. We present sufficient conditions for the ISA Separation Theorem. Namely, we
show that (i) elliptically symmetric sources and (ii) 2-dimensional sources invariant to 90° rotation, among others,
satisfy the conditions of the theorem.

1 Introduction 

Independent Component Analysis (ICA) [1,2] aims to recover linearly or non-linearly mixed independent and hidden
sources. There is a broad range of applications for ICA, such as blind source separation and blind source deconvolution
[3], feature extraction [4], and denoising [5]. Particular applications include, e.g., the analysis of financial data [6] and data from
neurobiology, fMRI, EEG, and MEG (see, e.g., [7,8] and references therein). For a recent review on ICA, see [9].

Original ICA algorithms are 1-dimensional in the sense that all sources are assumed to be independent real-valued
stochastic variables. However, applications where not all, but only certain groups of the sources are independent may
have high relevance in practice. In this case, the independent sources can be multi-dimensional. For example, consider the
generalization of the cocktail-party problem in which independent groups of people are talking about independent topics,
or in which more than one group of musicians is playing at the party. The separation task requires an extension of ICA,
which can be called Independent Subspace Analysis (ISA) or, alternatively, Multi-Dimensional Independent Component
Analysis (MICA) [10,11]. Throughout the paper, we shall use the former abbreviation. An important application of ISA
is, e.g., the processing of EEG-fMRI data [12].

Efforts have been made to develop ISA algorithms [10,11,12,13,14,15,16,17]. Related theoretical problems concern mostly
the estimation of entropy or mutual information. In this context, entropy estimation by Edgeworth expansion has
been extended to more than 2 dimensions and has been used for clustering and mutual information testing [18]. k-nearest
neighbors and geodesic spanning trees methods have been applied in [15] and [16] for the ISA problem. Other recent
approaches search for independent subspaces via kernel methods and joint block diagonalization [17].

An important observation of previous computer studies [10,19] is that general ISA solver algorithms are not more
efficient, and in fact sometimes produce lower quality results, than a simple ICA algorithm followed by a search for
the optimal permutation of the components. This observation led to the present theoretical work and to some computer
studies that have been published elsewhere [20].

This technical report is organized as follows: In Section 2 the ISA task is described. Section 3 contains our separation
theorem for the ISA task. Sufficient conditions for the theorem are provided in Section 4. Conclusions are drawn in
Section 5.



2 The ISA Model 

2.1 The ISA Equations 

The generative model of mixed independent multi-dimensional sources (Independent Subspace Analysis, ISA) is the
following. We assume that there are M hidden d-dimensional sources (components): $s^m$ (m = 1, ..., M). Only the
linear transformation

$$z = As \tag{1}$$

of their concatenated form

$$s := [s^1; \ldots; s^M] \tag{2}$$

is available for observation. Here, the total dimension of the sources is $D := d \cdot M$ and thus $s \in \mathbb{R}^D$, $A \in \mathbb{R}^{D \times D}$, and
$z \in \mathbb{R}^D$. In what follows, we shall assume that the mixing matrix A is invertible. The ISA task is to estimate the unknown
matrix A (or its inverse, the so-called separation matrix W) and the original sources by means of the observations z(i).
The special case of d = 1 corresponds to the ICA task.
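The generative model of Eqs. (1)-(2) can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name `sample_isa_mixture`, the choice of source distribution (a random radius times a uniform direction inside each subspace), and the Gaussian mixing matrix are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def sample_isa_mixture(M=3, d=2, T=1000, seed=0):
    """Draw T samples of z = A s with M independent d-dimensional sources."""
    rng = np.random.default_rng(seed)
    D = d * M
    blocks = []
    for _ in range(M):
        # Assumed source model: coordinates within a subspace are dependent,
        # while different subspaces are sampled independently.
        g = rng.standard_normal((d, T))
        u = g / np.linalg.norm(g, axis=0)   # uniform direction on the unit sphere
        r = rng.uniform(0.5, 1.5, size=T)   # non-negative radius, independent of u
        blocks.append(r * u)
    s = np.vstack(blocks)                   # s = [s^1; ...; s^M], shape (D, T)
    A = rng.standard_normal((D, D))         # generic mixing matrix (invertible almost surely)
    z = A @ s                               # Eq. (1)
    return A, s, z

A, s, z = sample_isa_mixture()
```

Only `z` would be available to an ISA algorithm; `A` and `s` are returned here so that an estimate can be compared against the ground truth.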



2.2 The Whiteness Assumption and its Consequences 

Given our assumption on the invertibility of matrix A, we can assume without any loss of generality that both the sources
and the observation are white, that is,

$$E[s] = 0, \quad E[ss^T] = I_D, \tag{3}$$
$$E[z] = 0, \quad E[zz^T] = I_D, \tag{4}$$

where superscript T denotes transposition, $I_D$ is the D-dimensional identity matrix, and $E[\cdot]$ denotes the expectation value
operator. It then follows that the mixing matrix A and thus the separation matrix $W = A^{-1}$ are orthogonal:

$$I_D = E[zz^T] = A E[ss^T] A^T = A I_D A^T = AA^T. \tag{5}$$

The ambiguity of the ISA task is decreased by Eqs. (3)-(4): now, the sources are determined up to permutation and orthogonal
transformation of the subspaces belonging to the $s^m$ sources. For more details on this subject, see [21].
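The consequence of whitening stated in Eq. (5) can be checked numerically. The sketch below assumes a symmetric whitening transform computed from the sample covariance (one of several possible whitening choices); the helper name `whiten` and the uniform source distribution are illustrative assumptions.

```python
import numpy as np

def whiten(x):
    """Center x (shape D x T) and map it to identity sample covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T   # symmetric inverse square root of cov
    return V @ x, V

rng = np.random.default_rng(0)
D, T = 4, 5000
s = rng.uniform(-1.0, 1.0, size=(D, T))
s, _ = whiten(s)                      # sources with exactly white sample statistics
A = rng.standard_normal((D, D))       # arbitrary invertible mixing matrix
z_white, V = whiten(A @ s)
A_eff = V @ A                         # effective mixing matrix after whitening
```

With white sources, the effective mixing matrix `A_eff` satisfies `A_eff @ A_eff.T = I` up to rounding error, which is the orthogonality claimed in Eq. (5).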



2.3 The ISA Cost Function 

The ISA task can be viewed as the minimization of the mutual information between the estimated components:

$$\min_{W \in \mathcal{O}^D} I(y^1, \ldots, y^M), \tag{6}$$

where $y = Wz$, $y = [y^1; \ldots; y^M]$, and $\mathcal{O}^D$ denotes the space of the $D \times D$ orthogonal matrices. Minimizing the cost function $I$ is
equivalent to minimizing the sum of the d-dimensional entropies, because

$$I(y^1, \ldots, y^M) = \sum_{m=1}^{M} H(y^m) - H(y) \tag{7}$$

$$= \sum_{m=1}^{M} H(y^m) - H(Wz) \tag{8}$$

$$= \sum_{m=1}^{M} H(y^m) - \left( H(z) + \ln(|\det(W)|) \right). \tag{9}$$



Here, H is Shannon's (multi-dimensional) differential entropy defined with the logarithm of base e, $|\cdot|$ denotes absolute
value, and 'det' stands for determinant. In the second equality, the relation $y = Wz$ was exploited; in the third, the rule

$$H(Wz) = H(z) + \ln(|\det(W)|) \tag{10}$$

describing the transformation of the differential entropy [22] was used. $|\det(W)| = 1$ because of the orthogonality of W,
so $\ln(|\det(W)|) = 0$. The H(z) term of the cost is constant in W; therefore, the ISA task is equivalent to the minimization
of the cost function

$$\min_{W \in \mathcal{O}^D} \sum_{m=1}^{M} H(y^m). \tag{11}$$
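The transformation rule (10) and the vanishing of the log-determinant term for orthogonal W can be verified in closed form for Gaussian variables, whose differential entropy is known explicitly. The following sketch assumes a zero-mean Gaussian z purely for convenience; the paper's argument does not depend on Gaussianity, and the helper name `gaussian_entropy` is hypothetical.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov): 0.5 * ln((2*pi*e)^D * det(cov))."""
    D = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (D * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(1)
D = 4
B = rng.standard_normal((D, D))
cov_z = B @ B.T + np.eye(D)                              # positive definite covariance of z
W_general = rng.standard_normal((D, D))                  # generic invertible matrix
W_orth, _ = np.linalg.qr(rng.standard_normal((D, D)))    # orthogonal matrix

# Rule (10): H(Wz) = H(z) + ln|det W|; the covariance of Wz is W cov_z W^T.
lhs = gaussian_entropy(W_general @ cov_z @ W_general.T)
rhs = gaussian_entropy(cov_z) + np.log(abs(np.linalg.det(W_general)))

# For orthogonal W the entropy is unchanged, since |det W| = 1.
shift_orth = gaussian_entropy(W_orth @ cov_z @ W_orth.T) - gaussian_entropy(cov_z)
```

Here `lhs` and `rhs` agree up to rounding, and `shift_orth` is numerically zero, which is why H(z) and the determinant term drop out of the optimization over orthogonal matrices.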



3 The ISA Separation Theorem 

The main result of this work is that the ISA task may be accomplished in two steps under certain conditions. In the first
step, ICA is executed. The second step is the search for the optimal permutation of the ICA components.
First, consider the so-called Entropy Power Inequality (EPI)

$$e^{2H\left(\sum_{i=1}^{L} u_i\right)} \ge \sum_{i=1}^{L} e^{2H(u_i)}, \tag{12}$$

where $u_1, \ldots, u_L \in \mathbb{R}$ denote continuous stochastic variables. This inequality holds, for example, for independent
continuous variables [22].
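As a concrete instance of the EPI (12), consider two independent variables that are uniform on [0, 1]: each has differential entropy 0, and their sum has a triangular density on [0, 2] with entropy 1/2, so the inequality reads e >= 2. The sketch below evaluates both entropies by a midpoint-rule integration; the helper `entropy_from_density` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def entropy_from_density(f, a, b, n=200_000):
    """Midpoint-rule estimate of -integral of f*ln(f) over [a, b], in nats."""
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    fx = f(x)
    safe = np.where(fx > 0, fx, 1.0)          # avoid log(0); such points contribute 0
    return np.sum(-fx * np.log(safe)) * (b - a) / n

def uniform_density(x):
    return np.ones_like(x)                    # density of u_1 and of u_2 on [0, 1]

def triangular_density(x):
    return 1.0 - np.abs(x - 1.0)              # density of u_1 + u_2 on [0, 2]

H_u = entropy_from_density(uniform_density, 0.0, 1.0)
H_sum = entropy_from_density(triangular_density, 0.0, 2.0)
lhs = np.exp(2 * H_sum)                       # left-hand side of (12)
rhs = 2 * np.exp(2 * H_u)                     # right-hand side of (12) with L = 2
```

The inequality is strict in this example; equality in the EPI is attained for independent Gaussian variables.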

Let $\|\cdot\|$ denote the Euclidean norm. That is, for $w \in \mathbb{R}^L$

$$\|w\|^2 := \sum_{i=1}^{L} w_i^2, \tag{13}$$

where $w_i$ is the i-th coordinate of vector w. The surface of the unit sphere in L dimensions shall be denoted by $S^L$:

$$S^L := \{w \in \mathbb{R}^L : \|w\| = 1\}. \tag{14}$$

If EPI is satisfied (on $S^L$) then a further inequality holds:

Lemma 1. Suppose that the continuous stochastic variables $u_1, \ldots, u_L \in \mathbb{R}$ satisfy the following inequality

$$e^{2H\left(\sum_{i=1}^{L} w_i u_i\right)} \ge \sum_{i=1}^{L} e^{2H(w_i u_i)}, \quad \forall w \in S^L. \tag{15}$$

This inequality will be called the w-EPI condition. Then

$$H\left(\sum_{i=1}^{L} w_i u_i\right) \ge \sum_{i=1}^{L} w_i^2 H(u_i), \quad \forall w \in S^L. \tag{16}$$

Note 1. w-EPI holds, for example, for independent variables $u_i$, because independence is not affected by multiplication
with a constant.

Proof. Assume that $w \in S^L$. Applying ln to condition (15) and using the monotonicity of the ln function, we can see
that the first inequality is valid in the following inequality chain

$$2H\left(\sum_{i=1}^{L} w_i u_i\right) \ge \ln\left(\sum_{i=1}^{L} e^{2H(w_i u_i)}\right) = \ln\left(\sum_{i=1}^{L} e^{2H(u_i)} \cdot w_i^2\right) \ge \sum_{i=1}^{L} w_i^2 \cdot \ln\left(e^{2H(u_i)}\right) = \sum_{i=1}^{L} w_i^2 \cdot 2H(u_i). \tag{17}$$

Then,

1. we used the relation [22]:

$$H(w_i u_i) = H(u_i) + \ln(|w_i|) \tag{18}$$

for the entropy of the transformed variable. Hence

$$e^{2H(w_i u_i)} = e^{2H(u_i) + 2\ln(|w_i|)} = e^{2H(u_i)} \cdot e^{2\ln(|w_i|)} = e^{2H(u_i)} \cdot w_i^2. \tag{19}$$

2. In the second inequality, we utilized the concavity of ln: the weights $w_i^2$ are non-negative and sum to 1 because $w \in S^L$. □
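Inequality (16) can be checked in closed form for independent Gaussian variables, for which w-EPI holds by Note 1 and all entropies are explicit. The sketch below is an illustration under that Gaussian assumption only; the variances and the number of sampled directions are arbitrary choices.

```python
import numpy as np

def gaussian_entropy_1d(var):
    """Differential entropy (in nats) of a 1-dimensional N(0, var) variable."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

rng = np.random.default_rng(2)
L = 5
variances = rng.uniform(0.1, 10.0, size=L)    # variances of the independent Gaussian u_i

gaps = []
for _ in range(1000):
    w = rng.standard_normal(L)
    w /= np.linalg.norm(w)                    # random point on the unit sphere S^L
    # sum_i w_i u_i is Gaussian with variance sum_i w_i^2 var_i
    lhs = gaussian_entropy_1d(np.sum(w**2 * variances))
    rhs = np.sum(w**2 * gaussian_entropy_1d(variances))
    gaps.append(lhs - rhs)                    # left minus right side of (16)
gaps = np.array(gaps)
```

All entries of `gaps` are non-negative up to rounding, in agreement with Lemma 1; the gap vanishes when all variances are equal.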
Now we shall use Lemma 1 to proceed. The separation theorem will be a corollary of the following claim:

Proposition 1. Let $y = [y^1; \ldots; y^M] = y(W) = Ws$, where $W \in \mathcal{O}^D$ and $y^m$ is the estimation of the m-th component of
the ISA task. Let $y_i^m$ be the i-th coordinate of the m-th component. Similarly, let $s_i^m$ stand for the i-th coordinate of the m-th
source. Let us assume that the $s^m$ sources satisfy condition (16). Then

$$\sum_{m=1}^{M} \sum_{i=1}^{d} H(y_i^m) \ge \sum_{m=1}^{M} \sum_{i=1}^{d} H(s_i^m). \tag{20}$$



Proof. Let us denote the (i, j)-th element of matrix W by $W_{i,j}$. Coordinates of y and s will be denoted by $y_i$ and $s_i$,
respectively. Further, let $S^1, \ldots, S^M$ denote the indices of the 1st, ..., M-th subspaces, i.e., $S^1 := \{1, \ldots, d\}, \ldots, S^M :=
\{D - d + 1, \ldots, D\}$. Now, writing the elements of the i-th row of matrix multiplication y = Ws, we have



Vi = E w ij*3 + ■■■+ E W ^ s 3 
jes 1 jeS M 



and thus, 

B(Vi) 



(21) 



' E W Wi 



H \ E^ 



vies 1 



/ 



v/es 1 



Ejss 1 Wi 'i s : 
^ies 1 ^,i s J 



r + -.+ ( E W t 



Wi 



\ieS M 



\ 



/ 



V" esl fees 1 w; 



> E^ E 



h3 



v/es 1 



^s 1 \ E Iesl w£ 




Y W u) H 

v ieS M 



E 



E^l E 

v ieS M 



TV, 



(22) 
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H( Sj ) (26) 



(27) 



- E 11 -V 7 + • • • + E 11 -V fe) 

jes 1 ieS M 
The above steps can be justified as follows: 

1. (22): Eq. (21) was inserted into the argument of H.

2. (23): New terms were added for Lemma 1.

3. (24): Sources $s^m$ are independent of each other and this independence is preserved upon mixing within the subspaces,
and we could also use Lemma 1 because W is an orthogonal matrix, so each of its rows has unit norm.

4. (25): Denominators were transferred into the $\sum_j$ terms.

5. (26): Variables $s^m$ satisfy condition (16) according to our assumptions.

6. (27): We simplified the expression after squaring.

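Using this inequality, summing it for i, exchanging the order of the sums, and making use of the orthogonality of matrix
W, we have

$$\sum_{i=1}^{D} H(y_i) \ge \sum_{i=1}^{D} \left( \sum_{j \in S^1} W_{i,j}^2 H(s_j) + \ldots + \sum_{j \in S^M} W_{i,j}^2 H(s_j) \right) \tag{28}$$

$$= \sum_{j \in S^1} \left( \sum_{i=1}^{D} W_{i,j}^2 \right) H(s_j) + \ldots + \sum_{j \in S^M} \left( \sum_{i=1}^{D} W_{i,j}^2 \right) H(s_j) \tag{29}$$

$$= \sum_{j=1}^{D} H(s_j). \tag{30}$$

□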



Note 2. The proof holds for subspaces with different dimensions. This is also true for the following theorem. 



Having this proposition, now we present our main theorem. 

Theorem 1 (Separation Theorem for ISA). Presume that the $s^m$ sources of the ISA model satisfy condition (16),
and that the ICA cost function $J(W) = \sum_{m=1}^{M} \sum_{i=1}^{d} H(y_i^m)$ has a minimum over $W \in \mathcal{O}^D$. Then it is sufficient to search for
the minimum of the ISA task as a permutation of the solution of the ICA task. Using the concept of separation matrices,
it is sufficient to explore forms

$$W_{ISA} = P W_{ICA}, \tag{31}$$

where $P$ ($\in \mathbb{R}^{D \times D}$) is a permutation matrix to be determined.

Proof. ICA minimizes the l.h.s. of Eq. (20), that is, it minimizes $\sum_{m=1}^{M} \sum_{i=1}^{d} H(y_i^m)$. The set of minima is invariant
to permutations and to changes of the signs. Also, according to Proposition 1, $\{s_i^m\}$, i.e., the $s_i^m$ coordinates of the
components of the solution of the ISA task belong to the set of the minima. □
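The second step of the theorem, Eq. (31), amounts to a combinatorial search over permutations of the ICA coordinates. The sketch below illustrates this step for d = 2 under explicit assumptions: the ICA output is replaced by a stand-in (source coordinates in scrambled order), and coordinates are grouped greedily by the absolute correlation of their squares. This dependence measure and the helper names `permutation_matrix` and `pair_components` are illustrative choices and are not the criterion used in the paper.

```python
import numpy as np
from itertools import combinations

def permutation_matrix(order):
    """Return P such that (P @ y)[k] = y[order[k]]."""
    D = len(order)
    P = np.zeros((D, D))
    P[np.arange(D), order] = 1.0
    return P

def pair_components(y):
    """Greedy grouping for d = 2 (even number of rows assumed): repeatedly pair
    the two remaining components whose squares are most strongly correlated."""
    dep = np.abs(np.corrcoef(y ** 2))
    free, order = set(range(y.shape[0])), []
    while free:
        i, j = max(combinations(sorted(free), 2), key=lambda p: dep[p[0], p[1]])
        order += [i, j]
        free -= {i, j}
    return order

rng = np.random.default_rng(3)
T = 20_000
theta = rng.uniform(0, 2 * np.pi, size=(2, T))
s = np.vstack([np.cos(theta[0]), np.sin(theta[0]),    # source 1: uniform on the unit circle
               np.cos(theta[1]), np.sin(theta[1])])   # source 2: an independent copy
y_ica = s[[0, 2, 1, 3]]        # stand-in for an ICA output: coordinates in scrambled order
order = pair_components(y_ica)
P = permutation_matrix(order)
y_isa = P @ y_ica              # coordinates regrouped into 2-dimensional subspaces
```

After the permutation, rows 0-1 and rows 2-3 of `y_isa` each hold the two coordinates of one source. In practice, `y_ica` would come from an ICA algorithm applied to the whitened observations, and the grouping criterion would be an estimate of the ISA cost.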

4 Sufficient Conditions of the Separation Theorem 

In the separation theorem, we assumed that relation (16) is fulfilled for the $s^m$ sources. Here, we shall provide sufficient
conditions under which this inequality is fulfilled.

4.1 w-EPI

According to Lemma 1, if the w-EPI property [i.e., (15)] holds for sources $s^m$, then inequality (16) holds, too.

4.2 Elliptically Symmetric Sources 

A stochastic variable is elliptically symmetric, or elliptical, for short, if its density function, which exists under mild
conditions, is constant on elliptic surfaces [footnote 1]. We shall show that relation (16) as well as the stronger w-EPI relation (15) are
fulfilled. We need certain definitions and some basic features to prove the above statement. Thus, below we shall elaborate
on spherical (spherically symmetric) and elliptically symmetric stochastic variables [23,24].

Basic Definitions 

Definition 1 (Characteristic function). The characteristic function of stochastic variable $v \in \mathbb{R}^d$ is defined by the
mapping

$$\mathbb{R}^d \ni t \mapsto \varphi_v(t) := E[\exp(i t^T v)], \tag{32}$$

where $i = \sqrt{-1}$ and exp is the exponential function.

Spherically symmetric variables can be introduced in different ways that, together, provide the view that we need 
here. 

Definition 2 (Spherically symmetric variable around $\mu$). A stochastic variable $v \in \mathbb{R}^d$ is called spherically
symmetric around $\mu$, if:

1. its density function is not modified by any rotation around $\mu$. Formally, if

$$v - \mu \overset{distr}{=} O(v - \mu), \quad \forall O \in \mathcal{O}^d, \tag{33}$$

where $\overset{distr}{=}$ denotes equality in distribution.

2. its characteristic function with some $\phi : [0, \infty) \to \mathbb{R}$ assumes the following form

$$\varphi_{v-\mu}(t) = \phi(t^T t). \tag{34}$$

Function $\phi$ is called the characteristic generator of v.

3. it has the following stochastic representation

$$v \overset{distr}{=} \mu + r u^{(d)}, \tag{35}$$

where

(a) $\mu \in \mathbb{R}^d$ is a constant vector,

(b) $u^{(d)}$ is a stochastic variable of uniform distribution over $S^d$,

(c) r is a non-negative scalar stochastic variable, which is independent of $u^{(d)}$.

Footnote 1: Elliptically symmetric variables are often called elliptically contoured stochastic variables.

We shall make use of the following well-known property of spherically symmetric variables: 

Proposition 2. Let v denote a d-dimensional variable, which is spherically symmetric around $\mu$. Then the projections of
$v - \mu$ onto lines through the origin have identical univariate distribution.

Affine transforms of spherically symmetric variables take us to the concept of elliptically symmetric variables. We shall 
be interested in the case, when the affine transformation is bijective. Then the following definitions are equivalent: 

Definition 3 (Elliptically symmetric variable around $\mu$). A stochastic variable $e \in \mathbb{R}^d$ is called elliptically
symmetric around $\mu$, if:

1. there exist $\mu \in \mathbb{R}^d$ and an invertible $A \in \mathbb{R}^{d \times d}$ such that

$$e = \mu + Av, \tag{36}$$

where v is a d-dimensional stochastic variable, which is spherically symmetric around 0. In this case, the characteristic
function of e is

$$\varphi_e(t) = \exp(i t^T \mu) \, \phi(t^T \Sigma t), \tag{37}$$

where $\Sigma := AA^T$ and $\phi$ is the characteristic generator of v.

2. there exist a vector $\mu \in \mathbb{R}^d$, a positive definite symmetric matrix $\Sigma \in \mathbb{R}^{d \times d}$, and a function $\phi : [0, \infty) \to \mathbb{R}$ such that the
characteristic function of $e - \mu$ is

$$\varphi_{e-\mu}(t) = \phi(t^T \Sigma t). \tag{38}$$

This property will be denoted as $e \sim E_d(\mu, \Sigma, \phi)$. $\phi$ will be called the characteristic generator of variable e.

3. e has a stochastic representation of the form

$$e \overset{distr}{=} \mu + r A u^{(d)}, \tag{39}$$

where $A \in \mathbb{R}^{d \times d}$ is an invertible matrix and

(a) $\mu \in \mathbb{R}^d$ is a constant vector,

(b) $u^{(d)}$ is a stochastic variable with uniform distribution on $S^d$,

(c) r is a non-negative scalar stochastic variable, which is independent of $u^{(d)}$.

Here, $\mu$, $\Sigma$, and r are called the location vector, the dispersion matrix, and the generating variate, respectively.

Basic Properties

Here, we list important properties of an elliptical variable $e \sim E_d(\mu, \Sigma, \phi)$.

1. Density function: if e has a density function, then it assumes the form

$$f_e(x) = |\Sigma|^{-1/2} \, g\left( (x - \mu)^T \Sigma^{-1} (x - \mu) \right), \quad x \ne \mu, \tag{40}$$

where

$$\frac{\pi^{d/2}}{\Gamma(d/2)} \int_0^{\infty} t^{d/2 - 1} g(t) \, dt = 1 \tag{41}$$

and $g : [0, \infty) \to \mathbb{R}$ is a non-negative function. Here, $\Gamma$ denotes the gamma function defined as

$$\Gamma(a) := \int_0^{\infty} t^{a-1} \exp(-t) \, dt \quad (a > 0). \tag{42}$$

One can show that condition (41) on g is necessary and sufficient for making (40) a density function. For the existence
of the density function it is sufficient if variable r is absolutely continuous. Then function g has an explicit form, see 



2. Moments: we consider the expectation value and the variance

$$Var[e] := E\left[ (e - E[e]) (e - E[e])^T \right] \tag{43}$$

of variable e. They exist iff the respective moments of r are finite. Then, supposing that $E[r^2]$ is finite, we have

$$E[e] = \mu, \tag{44}$$
$$Var[e] = \frac{E[r^2]}{d} \Sigma = -2\phi'(0) \Sigma. \tag{45}$$

In what follows, we assume that $E[r^2]$ is finite.
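The moment formulas (44)-(45) can be checked by sampling from the stochastic representation (39). The following is a Monte Carlo sketch only: the particular location vector, matrix A, and radius distribution (uniform on [0, 2], so that E[r^2] = 4/3) are arbitrary assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
d, T = 3, 200_000
mu = np.array([1.0, -2.0, 0.5])               # location vector (arbitrary)
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 1.5]])               # invertible matrix, dispersion is A @ A.T

g = rng.standard_normal((d, T))
u = g / np.linalg.norm(g, axis=0)             # uniform distribution on the unit sphere S^d
r = rng.uniform(0.0, 2.0, size=T)             # generating variate with E[r^2] = 4/3
e = mu[:, None] + A @ (r * u)                 # stochastic representation (39)

sample_mean = e.mean(axis=1)
sample_cov = np.cov(e)
predicted_cov = (4.0 / 3.0) / d * (A @ A.T)   # Eq. (45): E[r^2] / d * Sigma
```

Up to sampling error, `sample_mean` matches `mu` and `sample_cov` matches `predicted_cov`.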
Elliptical Sources

Now we are ready to claim the following proposition.

Proposition 3. Elliptical sources $s^m$ (m = 1, ..., M) with finite covariances satisfy condition (16) of the ISA separation
theorem. Further, they satisfy w-EPI (with equality).

Proof. Here, we show that the w-EPI property is fulfilled with equality. Let $s^m \sim E_d(\mu^m, \Sigma^m, \phi^m)$ (m = 1, ..., M)
denote elliptical sources. Let us normalize each of them as

$$s^m \mapsto (\Sigma^m)^{-1/2} (s^m - \mu^m). \tag{46}$$

So, it is sufficient to prove this proposition for spherically symmetric sources. In what follows, $s^m$ denotes these
spherically symmetric sources. According to (44)-(45), spherically symmetric sources $s^m$ have zero expectation values
and, up to a constant multiplier, they also have identity covariance matrices:

$$E[s^m] = 0, \tag{47}$$
$$Var[s^m] = c^m \cdot I_d. \tag{48}$$

Note that our constraint on the ISA task, namely that covariance matrices of the $s^m$ sources should be equal to $I_d$, is
fulfilled up to constant multipliers.

Let $P_w$ denote the projection to the straight line with direction $w \in S^d$, which crosses the origin, i.e.,

$$P_w : \mathbb{R}^d \ni u \mapsto \sum_{i=1}^{d} w_i u_i \in \mathbb{R}. \tag{49}$$

In particular, if w is chosen as the canonical basis vector $e_i$ (all components are 0, except the i-th component, which
is equal to 1), then

$$P_{e_i}(u) = u_i. \tag{50}$$

In this interpretation, the w-EPI condition (15) is concerned with the entropies of the projections of the different sources onto
straight lines crossing the origin. The l.h.s. projects to w, whereas the r.h.s. projects to the canonical basis vectors. Let
u denote an arbitrary source, i.e., $u := s^m$. According to Proposition 2, the distribution of the spherical u is the same for all
such projections and thus its entropy is identical. That is,

$$\sum_{i=1}^{d} w_i u_i \overset{distr}{=} u_1 \overset{distr}{=} \ldots \overset{distr}{=} u_d, \quad \forall w \in S^d, \tag{51}$$

$$H\left( \sum_{i=1}^{d} w_i u_i \right) = H(u_1) = \ldots = H(u_d), \quad \forall w \in S^d. \tag{52}$$

Thus:

- l.h.s. of w-EPI: $e^{2H(u_1)}$.

- r.h.s. of w-EPI:

$$\sum_{i=1}^{d} e^{2H(w_i u_i)} = \sum_{i=1}^{d} e^{2H(u_i)} \cdot w_i^2 = e^{2H(u_1)} \sum_{i=1}^{d} w_i^2 = e^{2H(u_1)} \cdot 1 = e^{2H(u_1)}. \tag{53}$$

At the first step, we used identity (19) for each of the terms. At the second step, (52) was utilized. Then the term $e^{2H(u_1)}$
was pulled out and we took into account that $w \in S^d$.

□ 
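The key fact used in the proof, Eq. (51), can be illustrated empirically: for a spherical variable, the projection onto an arbitrary unit direction has the same distribution as any single coordinate. The sketch below compares empirical quantiles of the two projections; the radius distribution and the sample size are arbitrary assumptions, and agreement of quantiles is only a numerical indication, not a proof.

```python
import numpy as np

rng = np.random.default_rng(5)
d, T = 3, 200_000
g = rng.standard_normal((d, T))
direction = g / np.linalg.norm(g, axis=0)     # uniform on the unit sphere S^d
radius = rng.uniform(0.5, 1.5, size=T)        # independent non-negative radius
u = radius * direction                        # spherically symmetric around 0

w = rng.standard_normal(d)
w /= np.linalg.norm(w)                        # arbitrary direction on S^d
proj_w = w @ u                                # projection P_w(u) of Eq. (49)
proj_e1 = u[0]                                # projection onto the first canonical basis vector

qs = np.linspace(0.05, 0.95, 19)
max_quantile_gap = np.max(np.abs(np.quantile(proj_w, qs) - np.quantile(proj_e1, qs)))
```

The value of `max_quantile_gap` is small compared with the spread of the projections, consistent with identical univariate distributions and hence with identical entropies in Eq. (52).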

Note 3. We note that sources of spherically symmetric distribution have already been used in the context of ISA in
[11]. In that work, a generative model was assumed. According to the assumption, the distributions of the norms of
sample projections to the subspaces were independent. This way, the task was restricted to spherically symmetric source
distributions, which is a special case of the general ISA task.



4.3 Sources Invariant to 90° Rotation 

In the previous section, we have seen that the case of elliptical $s^m$ sources can be reduced to the spherical case [footnote 2], and that
spherical variables are invariant to orthogonal transformations [see Eq. (33)]. For mixtures of 2-dimensional components

(d = 2), a much milder condition, invariance to 90° rotation, suffices. First, we observe that:

Note 4. In the ISA separation theorem, it is sufficient if some orthogonal transformation of the $s^m$ sources, $C^m s^m$
($C^m \in \mathcal{O}^d$), satisfies condition (16). In this case, the $C^m s^m$ variables are extracted by the permutation search after the
ICA transformation. Because the ISA identification has ambiguities up to orthogonal transformation in the respective
subspaces, this is suitable. In other words, for the ISA identification the existence of an Orthonormal Basis (ONB) for
each $u := s^m \in \mathbb{R}^d$ component is sufficient, on which the

$$h : \mathbb{R}^d \ni w \mapsto H[\langle w, u \rangle] \tag{54}$$

function takes its minimum. (Here, the stochastic variable $\langle w, u \rangle := \sum_{i=1}^{d} w_i u_i$ is the projection of u to the direction w.)
In this case, the entropy inequality (16) is met with equality on the elements of the ONB.

Now we present our theorem concerning the d = 2 case.

Theorem 2. Let us suppose that the density function f of stochastic variable $u = (u_1, u_2)$ ($= s^m$) $\in \mathbb{R}^2$ exhibits the
invariance

$$f(u_1, u_2) = f(-u_2, u_1) = f(-u_1, -u_2) = f(u_2, -u_1) \quad (\forall u \in \mathbb{R}^2), \tag{55}$$

that is, it is invariant to 90° rotation. If function $h(w) = H[\langle w, u \rangle]$ has a minimum on the set $\{w \ge 0\} \cap S^2$, it also has
a minimum on an ONB [footnote 3]. Consequently, the ISA task can be identified by the use of the separation theorem.

Proof. Let

$$R := \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \tag{56}$$

denote the matrix of 90° ccw rotation. Let $w \in S^2$. $\langle w, u \rangle \in \mathbb{R}$ is the projection of variable u onto w. The value of the
density function of the stochastic variable $\langle w, u \rangle$ in $t \in \mathbb{R}$ (we move t in direction w) can be calculated by integration
starting from the point $tw$, in the direction perpendicular to w:

$$f_{y = y(w) = \langle w, u \rangle}(t) = \int_{w^{\perp}} f(tw + z) \, dz, \tag{57}$$

where $w^{\perp}$ denotes the line through the origin that is perpendicular to w. Using the supposed invariance of f and the relation (57), we have

$$f_{y(w)} = f_{y(Rw)} = f_{y(R^2 w)} = f_{y(R^3 w)}, \tag{58}$$

where '=' denotes the equality of functions. Consequently, it is enough to optimize h on the set $\{w \ge 0\}$. Let $w_{min}$ be
the minimum of function h on the set $S^2 \cap \{w \ge 0\}$. According to Eq. (58), h takes constant and minimal values in the
points $w_{min}$, $R w_{min}$, $R^2 w_{min}$, and $R^3 w_{min}$. $\{w_{min}, R w_{min}\}$ is a suitable ONB in Note 4. □

Footnote 2: Non-singular affine transformation can be freely performed on the sources because of the detailed ambiguities of the ISA task.

Footnote 3: Relation $w \ge 0$ concerns each coordinate.



Note 5. A special case of the requirement (55) is invariance to permutation and sign changes, that is

$$f(\pm u_1, \pm u_2) = f(\pm u_2, \pm u_1). \tag{59}$$

In other words, there exists a function $g : \mathbb{R}^2 \to \mathbb{R}$, which is symmetric in its variables and

$$f(u) = g(|u_1|, |u_2|). \tag{60}$$

The domain of the theorem includes

1. the formerly presented spherical variables,

2. or more generally, variables with density function of the form

$$f(u) = g\left( \sum_{i} |u_i|^p \right) \quad (p > 0). \tag{61}$$

In the literature essentially these variables are called $L^p$-norm sphericals (for p > 1). Here, we use the $L^p$-norm
spherical denomination in a slightly extended way, for p > 0.

4.4 Takano's Dependency Criterion 

We have seen that the w-EPI property is sufficient for the ISA separation theorem. In [25], a sufficient condition is provided
to satisfy the EPI condition. The condition is based on the dependencies of the variables and it concerns the 2-dimensional
case. The constraint d = 2 might be generalized to higher dimensions, but we are not aware of such generalizations.

We note, however, that w-EPI requires that EPI be satisfied on the surface of the unit sphere. Thus it is sufficient
to consider the intersection of the conditions detailed in [25] on the surface of the unit sphere.



4.5 Summary of Sufficient Conditions 

Here, we summarize the presented sufficient conditions of the ISA separation theorem. We have proven that the
requirement described by Eq. (16) for the $s^m$ sources is sufficient for the theorem. This holds if the w-EPI condition (15) is
fulfilled. The stronger w-EPI is valid for

1. sources satisfying Takano's weak dependency criterion,

2. spherical sources (with equality),

3. sources invariant to 90° rotation (for d = 2). Specifically, (i) variables invariant to permutation and sign changes, and
(ii) $L^p$-norm spherical variables belong to this family.

These results are summarized schematically in Table 1.



5 Conclusions 

In this paper, a separation theorem was presented for the Independent Subspace Analysis (ISA) problem. If the conditions
of the theorem are satisfied, then the ISA task can be solved in 2 steps. The first step is concerned with the search
for 1-dimensional independent components. The second step corresponds to a combinatorial problem, the search for the
optimal permutation. We have shown that elliptically symmetric sources satisfy the conditions of the theorem. In the case
of 2-dimensional sources (d = 2), invariance to 90° rotation or Takano's dependency criterion is sufficient for the
separation.

These results support our experience that the presented 2-step procedure for solving the ISA task may produce
higher quality subspaces than sophisticated search algorithms [15].

Finally, we mention that the possibility of this two-step procedure was first noted in [10].



Table 1. Sufficient conditions for the separation theorem.

Takano's dependency (d = 2)  =>  w-EPI
spherical symmetry (or elliptical)  =>  w-EPI (with = for all $w \in S^d$)
w-EPI  =>  Equation (16)
invariance to 90° rotation (d = 2)  =>  Equation (16) (with = for a suitable ONB)
invariance to sign and permutation: special case of invariance to 90° rotation
$L^p$-norm spherical (p > 0): special case of invariance to sign and permutation, and generalization of spherical symmetry for d = 2
Equation (16): sufficient for the Separation Theorem